THE PROBABLE ERROR OF A MEAN 

Introduction 

Any experiment may be regarded as forming an individual of a “population” of
experiments which might be performed under the same conditions. A series of experiments
is a sample drawn from this population.

Now any series of experiments is only of value in so far as it enables us to form 
a judgment as to the statistical constants of the population to which the experiments 
belong. In a greater number of cases the question finally turns on the value of a mean, 
either directly, or as the mean difference between the two quantities. 

If the number of experiments be very large, we may have precise information as to 
the value of the mean, but if our sample be small, we have two sources of uncertainty: 
(1) owing to the “error of random sampling” the mean of our series of experiments 
deviates more or less widely from the mean of the population, and (2) the sample is not 
sufficiently large to determine what is the law of distribution of individuals. It is usual, 
however, to assume a normal distribution, because, in a very large number of cases, 
this gives an approximation so close that a small sample will give no real information 
as to the manner in which the population deviates from normality: since some law of 
distribution must be assumed it is better to work with a curve whose area and ordinates
are tabled, and whose properties are well known. This assumption is accordingly made
in the present paper, so that its conclusions are not strictly applicable to populations
known not to be normally distributed; yet it appears probable that the deviation from
normality must be very extreme to lead to serious error. We are concerned here solely
with the first of these two sources of uncertainty. 

The usual method of determining the probability that the mean of the population 
lies within a given distance of the mean of the sample is to assume a normal distribution 
about the mean of the sample with a standard deviation equal to $s/\sqrt{n}$, where $s$ is the
standard deviation of the sample, and to use the tables of the probability integral. 

But, as we decrease the number of experiments, the value of the standard deviation 
found from the sample of experiments becomes itself subject to an increasing error, 
until judgments reached in this way may become altogether misleading. 

In routine work there are two ways of dealing with this difficulty: (1) an experiment
may be repeated many times, until such a long series is obtained that the standard
deviation is determined once and for all with sufficient accuracy. This value can then
be used for subsequent shorter series of similar experiments. (2) Where experiments
are done in duplicate in the natural course of the work, the root mean square of the
differences between corresponding pairs is equal to the standard deviation of the population
multiplied by $\sqrt{2}$. We can thus combine together several series of experiments for
the purpose of determining the standard deviation. Owing however to secular change,
the value obtained is nearly always too low, successive experiments being positively
correlated.
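
The duplicate method in (2) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the duplicates are taken to be independent normal draws, and the function name, the value of sigma, the number of pairs and the seed are all hypothetical choices made for the illustration.

```python
import math
import random

def rms_duplicate_difference(sigma, pairs, seed=0):
    """Root mean square of the differences between simulated duplicates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        a = rng.gauss(0.0, sigma)   # first determination
        b = rng.gauss(0.0, sigma)   # its duplicate
        total += (a - b) ** 2
    return math.sqrt(total / pairs)

sigma = 2.0                                   # assumed population value
rms = rms_duplicate_difference(sigma, pairs=20000)
estimate = rms / math.sqrt(2)                 # recover sigma from the duplicates
print(round(rms, 3), round(estimate, 3))
```

With independent duplicates the estimate should settle near the assumed sigma; positively correlated successive experiments, as the text warns, would pull it downward.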

There are other experiments, however, which cannot easily be repeated very often; 
in such cases it is sometimes necessary to judge of the certainty of the results from 
a very small sample, which itself affords the only indication of the variability. Some 
chemical, many biological, and most agricultural and large-scale experiments belong 
to this class, which has hitherto been almost outside the range of statistical inquiry. 

Again, although it is well known that the method of using the normal curve is only
trustworthy when the sample is “large”, no one has yet told us very clearly where the
limit between “large” and “small” samples is to be drawn.

The aim of the present paper is to determine the point at which we may use the 
tables of the probability integral in judging of the significance of the mean of a series of 
experiments, and to furnish alternative tables for use when the number of experiments 
is too few. 

The paper is divided into the following nine sections: 

I. The equation is determined of the curve which represents the frequency distribution 
of standard deviations of samples drawn from a normal population. 

II. There is shown to be no kind of correlation between the mean and the standard 
deviation of such a sample. 

III. The equation is determined of the curve representing the frequency distribution of 
a quantity $z$, which is obtained by dividing the distance between the mean of a sample
and the mean of the population by the standard deviation of the sample. 

IV. The curve found in I is discussed. 

V. The curve found in III is discussed. 

VI. The two curves are compared with some actual distributions. 

VII. Tables of the curves found in III are given for samples of different size. 

VIII and IX. The tables are explained and some instances are given of their use. 

X. Conclusions. 

Section I 

Samples of n individuals are drawn out of a population distributed normally, to 
find an equation which shall represent the frequency of the standard deviations of these 
samples. 

If $s$ be the standard deviation found from a sample $x_1, x_2, \ldots, x_n$ (all these being
measured from the mean of the population), then

\[ s^2 = \frac{S(x_1^2)}{n} - \left(\frac{S(x_1)}{n}\right)^2 = \frac{S(x_1^2)}{n} - \frac{S(x_1^2)}{n^2} - \frac{2S(x_1x_2)}{n^2}. \]

Summing for all samples and dividing by the number of samples we get the mean
value of $s^2$, which we will write $\bar{s}^2$:

\[ \bar{s}^2 = \frac{n\mu_2}{n} - \frac{n\mu_2}{n^2} = \frac{\mu_2(n-1)}{n}, \]

where $\mu_2$ is the second moment coefficient in the original normal distribution of $x$:
since $x_1$, $x_2$, etc. are not correlated and the distribution is normal, products involving
odd powers of $x_1$ vanish on summing, so that $2S(x_1x_2)/n^2$ is equal to 0.
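
The mean value of the squared sample standard deviation can be checked by simulation. The sketch below assumes a normal population with mean 0 and uses the divisor n, as the text does; the function name, sample counts and seed are hypothetical.

```python
import random

def mean_sample_variance(n, sigma, samples, seed=1):
    """Average of s^2 (divisor n) over many simulated normal samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(xs) / n
        total += sum(x * x for x in xs) / n - mean * mean
    return total / samples

n, sigma = 4, 1.0
observed = mean_sample_variance(n, sigma, samples=40000)
expected = sigma ** 2 * (n - 1) / n     # mu_2 (n - 1) / n from the text
print(round(observed, 4), expected)
```

The simulated average should fall close to the theoretical value, and visibly below the population variance itself.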
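
If $M'_R$ represent the $R$th moment coefficient of the distribution of $s^2$ about the end
of the range where $s^2 = 0$,

\[ M'_1 = \mu_2\,\frac{n-1}{n}. \]

Again

\[ s^4 = \left\{\frac{S(x_1^2)}{n} - \left(\frac{S(x_1)}{n}\right)^2\right\}^2 \]
\[ \quad = \frac{S(x_1^4)}{n^2} + \frac{2S(x_1^2x_2^2)}{n^2} - \frac{2S(x_1^4)}{n^3} - \frac{4S(x_1^2x_2^2)}{n^3} + \frac{S(x_1^4)}{n^4} + \frac{6S(x_1^2x_2^2)}{n^4} \]

+ other terms involving odd powers of $x_1$, etc. which will vanish on summation.

Now $S(x_1^4)$ has $n$ terms and $S(x_1^2x_2^2)$ has $\frac{1}{2}n(n-1)$; summing for all samples, and
remembering that $\mu_4 = 3\mu_2^2$ for the normal distribution, we get

\[ M'_2 = \mu_2^2\,\frac{(n-1)(n+1)}{n^2}. \]

In a similar way I find

\[ M'_3 = \mu_2^3\,\frac{(n-1)(n+1)(n+3)}{n^3} \]

and

\[ M'_4 = \mu_2^4\,\frac{(n-1)(n+1)(n+3)(n+5)}{n^4}. \]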
The law of formation of these moment coefficients appears to be a simple one, but 
I have not seen my way to a general proof. 
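
The second of these moment coefficients can be verified exactly from the expansion of $s^4$. The sketch below is not a general proof; it only counts the terms of the expansion for a unit-variance normal population (so $\mu_2 = 1$, $\mu_4 = 3$) and compares the result with the conjectured product pattern. The function names are hypothetical.

```python
from fractions import Fraction

def second_raw_moment_of_s2(n):
    """E[s^4] for unit-variance normal data, term by term from the expansion."""
    mu2, mu4 = Fraction(1), Fraction(3)        # normal: mu4 = 3 * mu2**2
    n = Fraction(n)
    # S(x^4) has n terms, each with mean mu4
    fourth = n * mu4 * (1 / n**2 - 2 / n**3 + 1 / n**4)
    # S(x_i^2 x_j^2) has n(n-1)/2 terms, each with mean mu2**2
    pairs = n * (n - 1) / 2
    cross = pairs * mu2**2 * (2 / n**2 - 4 / n**3 + 6 / n**4)
    return fourth + cross

def law_of_formation(n, r):
    """(n-1)(n+1)...(n+2r-3) / n^r, the pattern suggested in the text."""
    value = Fraction(1)
    for k in range(r):
        value *= Fraction(n - 1 + 2 * k, n)
    return value

for n in (2, 4, 10):
    print(n, second_raw_moment_of_s2(n), law_of_formation(n, 2))
```

For every sample size tried the two columns agree exactly, which supports the pattern for the second moment without settling the general case.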

If now $M_R$ be the $R$th moment coefficient of $s^2$ about its mean, we have

\[ M_2 = \mu_2^2\,\frac{n-1}{n^2}\{(n+1)-(n-1)\} = 2\mu_2^2\,\frac{n-1}{n^2}, \]

\[ M_3 = \mu_2^3\left\{\frac{(n-1)(n+1)(n+3)}{n^3} - \frac{3(n-1)}{n}\cdot\frac{2(n-1)}{n^2} - \frac{(n-1)^3}{n^3}\right\} \]
\[ \quad = \mu_2^3\,\frac{n-1}{n^3}\{n^2+4n+3-6n+6-n^2+2n-1\} = 8\mu_2^3\,\frac{n-1}{n^3}, \]

\[ M_4 = \frac{\mu_2^4}{n^4}\{(n-1)(n+1)(n+3)(n+5) - 32(n-1)^2 - 12(n-1)^3 - (n-1)^4\} \]
\[ \quad = \frac{\mu_2^4(n-1)}{n^4}\{n^3+9n^2+23n+15-32n+32-12n^2+24n-12-n^3+3n^2-3n+1\} \]
\[ \quad = \frac{12\mu_2^4(n-1)(n+3)}{n^4}. \]

Hence

\[ \beta_1 = \frac{M_3^2}{M_2^3} = \frac{8}{n-1}, \qquad \beta_2 = \frac{M_4}{M_2^2} = \frac{3(n+3)}{n-1}, \]

\[ \therefore\; 2\beta_2 - 3\beta_1 - 6 = \frac{1}{n-1}\{6(n+3) - 24 - 6(n-1)\} = 0. \]


Consequently a curve of Prof. Pearson’s Type III may be expected to fit the distribution
of $s^2$.
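
The criterion just used can be checked with exact arithmetic. The sketch takes the three central moments of $s^2$ as given in the text (with $\mu_2 = 1$, which does not affect the ratios) and confirms that $2\beta_2 - 3\beta_1 - 6$ vanishes; the function names are hypothetical.

```python
from fractions import Fraction

def central_moments_of_s2(n):
    """M_2, M_3, M_4 of s^2 about its mean, for mu_2 = 1."""
    n = Fraction(n)
    m2 = 2 * (n - 1) / n**2
    m3 = 8 * (n - 1) / n**3
    m4 = 12 * (n - 1) * (n + 3) / n**4
    return m2, m3, m4

def pearson_betas(n):
    """beta_1 = M_3^2 / M_2^3 and beta_2 = M_4 / M_2^2."""
    m2, m3, m4 = central_moments_of_s2(n)
    return m3**2 / m2**3, m4 / m2**2

for n in (4, 10):
    b1, b2 = pearson_betas(n)
    print(n, b1, b2, 2 * b2 - 3 * b1 - 6)
```

The last printed column is exactly zero for each sample size, which is the condition under which a Type III curve is indicated.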

The equation referred to an origin at the zero end of the curve will be

\[ y = Cx^p e^{-\gamma x}, \]

where

\[ \gamma = \frac{2M_2}{M_3} = \frac{4\mu_2^2(n-1)n^3}{8n^2\mu_2^3(n-1)} = \frac{n}{2\mu_2} \]

and

\[ p = \frac{4}{\beta_1} - 1 = \frac{n-1}{2} - 1 = \frac{n-3}{2}. \]

Consequently the equation becomes

\[ y = Cx^{\frac{n-3}{2}} e^{-\frac{nx}{2\mu_2}}, \]

which will give the distribution of $s^2$.

The area of this curve is $C\int_0^\infty x^{\frac{n-3}{2}} e^{-\frac{nx}{2\mu_2}}\,dx = I$ (say). The first moment coefficient
about the end of the range will therefore be

\[ \frac{C\int_0^\infty x^{\frac{n-1}{2}} e^{-\frac{nx}{2\mu_2}}\,dx}{I} = \frac{\left[C\left(-\frac{2\mu_2}{n}\right)x^{\frac{n-1}{2}} e^{-\frac{nx}{2\mu_2}}\right]_{x=0}^{x=\infty}}{I} + \frac{C\int_0^\infty \frac{n-1}{n}\,\mu_2\,x^{\frac{n-3}{2}} e^{-\frac{nx}{2\mu_2}}\,dx}{I}. \]

The first part vanishes at each limit and the second is equal to

\[ \frac{\frac{n-1}{n}\,\mu_2\,I}{I} = \frac{n-1}{n}\,\mu_2, \]

and we see that the higher moment coefficients will be formed by multiplying successively
by $\frac{n+1}{n}\mu_2$, $\frac{n+3}{n}\mu_2$, etc., just as appeared to be the law of formation of $M'_2$, $M'_3$,
$M'_4$, etc.

Hence it is probable that the curve found represents the theoretical distribution of
$s^2$; so that although we have no actual proof we shall assume it to do so in what follows.
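
The agreement between the moments of the fitted curve and the product pattern can also be checked numerically. The sketch below uses the gamma function, a tool the paper does not use, to evaluate the raw moments of a curve of the form $Cx^{(n-3)/2}e^{-nx/2\mu_2}$; the function names are hypothetical.

```python
import math

def type3_raw_moment(n, r, mu2=1.0):
    """r-th moment about zero of y = C x^((n-3)/2) exp(-n x / (2 mu2))."""
    shape = (n - 1) / 2.0          # exponent plus one
    scale = 2.0 * mu2 / n          # reciprocal of the rate n / (2 mu2)
    return math.gamma(shape + r) / math.gamma(shape) * scale ** r

def product_law(n, r, mu2=1.0):
    """(n-1)(n+1)...(n+2r-3) mu2^r / n^r."""
    value = 1.0
    for k in range(r):
        value *= (n - 1 + 2 * k) * mu2 / n
    return value

for r in range(1, 5):
    print(r, type3_raw_moment(10, r), product_law(10, r))
```

The two columns agree to rounding error, consistent with the statement that each higher moment is obtained by a further multiplication.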

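The distribution of $s$ may be found from this, since the frequency of $s$ is equal to
that of $s^2$ and all that we must do is to compress the base line suitably.

Now if $y_1 = \phi(s^2)$ be the frequency curve of $s^2$
and $y_2 = \psi(s)$ be the frequency curve of $s$,

then

\[ y_1\,d(s^2) = y_2\,ds, \]
\[ y_2\,ds = 2y_1 s\,ds, \]
\[ \therefore\; y_2 = 2sy_1. \]

Hence

\[ y_2 = 2Cs\,(s^2)^{\frac{n-3}{2}} e^{-\frac{ns^2}{2\mu_2}} \]

is the distribution of $s$. This reduces to

\[ y_2 = 2Cs^{n-2} e^{-\frac{ns^2}{2\mu_2}}. \]

Hence $y = Ax^{n-2}e^{-\frac{nx^2}{2\sigma^2}}$ will give the frequency distribution of standard deviations
of samples of $n$, taken out of a population distributed normally with standard deviation
$\sigma$. The constant $A$ may be found by equating the area of the curve as follows:

\[ \text{Area} = A\int_0^\infty x^{n-2}e^{-\frac{nx^2}{2\sigma^2}}\,dx. \qquad \left(\text{Let } I_p \text{ represent } \int_0^\infty x^p e^{-\frac{nx^2}{2\sigma^2}}\,dx.\right) \]

Then

\[ I_p = \frac{\sigma^2}{n}\int_0^\infty x^{p-1}\frac{d}{dx}\left(-e^{-\frac{nx^2}{2\sigma^2}}\right)dx \]
\[ \quad = \frac{\sigma^2}{n}\left[-x^{p-1}e^{-\frac{nx^2}{2\sigma^2}}\right]_{x=0}^{x=\infty} + \frac{\sigma^2}{n}(p-1)\int_0^\infty x^{p-2}e^{-\frac{nx^2}{2\sigma^2}}\,dx \]
\[ \quad = \frac{\sigma^2}{n}(p-1)I_{p-2}, \]

since the first part vanishes at both limits.

By continuing this process we find

\[ I_{n-2} = \left(\frac{\sigma^2}{n}\right)^{\frac{n-2}{2}}(n-3)(n-5)\cdots 3\cdot 1\; I_0 \]

or

\[ I_{n-2} = \left(\frac{\sigma^2}{n}\right)^{\frac{n-3}{2}}(n-3)(n-5)\cdots 4\cdot 2\; I_1 \]

according as $n$ is even or odd.

But $I_0$ is

\[ \int_0^\infty e^{-\frac{nx^2}{2\sigma^2}}\,dx = \sqrt{\frac{\pi}{2n}}\,\sigma, \]

and $I_1$ is

\[ \int_0^\infty xe^{-\frac{nx^2}{2\sigma^2}}\,dx = \left[-\frac{\sigma^2}{n}e^{-\frac{nx^2}{2\sigma^2}}\right]_{x=0}^{x=\infty} = \frac{\sigma^2}{n}. \]

Hence if $n$ be even,

\[ A = \frac{\text{Area}}{(n-3)(n-5)\cdots 3\cdot 1\,\sqrt{\frac{\pi}{2}}\left(\frac{\sigma^2}{n}\right)^{\frac{n-1}{2}}}, \]

while if $n$ be odd

\[ A = \frac{\text{Area}}{(n-3)(n-5)\cdots 4\cdot 2\,\left(\frac{\sigma^2}{n}\right)^{\frac{n-1}{2}}}. \]

Hence the equation may be written

\[ y = \frac{N}{(n-3)(n-5)\cdots 3\cdot 1}\sqrt{\frac{2}{\pi}}\left(\frac{n}{\sigma^2}\right)^{\frac{n-1}{2}}x^{n-2}e^{-\frac{nx^2}{2\sigma^2}} \quad (n \text{ even}) \]

or

\[ y = \frac{N}{(n-3)(n-5)\cdots 4\cdot 2}\left(\frac{n}{\sigma^2}\right)^{\frac{n-1}{2}}x^{n-2}e^{-\frac{nx^2}{2\sigma^2}} \quad (n \text{ odd}) \]

where $N$ as usual represents the total frequency.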
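
As a check on the constants, the frequency curve of the standard deviation can be integrated numerically. The sketch below takes N = 1 and sigma = 1, and uses Simpson's rule over a finite range as a stand-in for the infinite integral; the function names, the upper limit of 6 and the number of steps are hypothetical choices.

```python
import math

def descending_product(k):
    """k (k-2) (k-4) ... ending at 2 or 1; taken as 1 when k <= 1."""
    value = 1
    while k > 1:
        value *= k
        k -= 2
    return value

def sd_density(x, n, sigma=1.0):
    """Frequency curve of the sample standard deviation with N = 1."""
    const = (n / sigma ** 2) ** ((n - 1) / 2) / descending_product(n - 3)
    if n % 2 == 0:
        const *= math.sqrt(2 / math.pi)   # extra factor for even n
    return const * x ** (n - 2) * math.exp(-n * x * x / (2 * sigma ** 2))

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

for n in (4, 10):
    print(n, simpson(lambda x: sd_density(x, n), 0.0, 6.0))
```

Both areas come out at unity to within the accuracy of the quadrature, as they should if the constants for even and odd n are correct.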


Section II 


To show that there is no correlation between (a) the distance of the mean of a 
sample from the mean of the population and (b) the standard deviation of a sample 
with normal distribution. 

(1) Clearly positive and negative positions of the mean of the sample are equally
likely, and hence there cannot be correlation between the absolute value of the distance
of the mean from the mean of the population and the standard deviation, but (2) there
might be correlation between the square of the distance and the square of the standard
deviation. Let

\[ u^2 = \left(\frac{S(x_1)}{n}\right)^2 \quad\text{and}\quad s^2 = \frac{S(x_1^2)}{n} - \left(\frac{S(x_1)}{n}\right)^2. \]

Then if $m'_1$, $M'_1$ be the mean values of $u^2$ and $s^2$, we have by the preceding part

\[ M'_1 = \mu_2\,\frac{n-1}{n} \quad\text{and}\quad m'_1 = \frac{\mu_2}{n}. \]

Now

\[ u^2s^2 = \frac{S(x_1^2)}{n}\left(\frac{S(x_1)}{n}\right)^2 - \left(\frac{S(x_1)}{n}\right)^4 \]
\[ \quad = \frac{\{S(x_1^2)\}^2}{n^3} + \frac{2S(x_1x_2)\,S(x_1^2)}{n^3} - \frac{S(x_1^4)}{n^4} - \frac{6S(x_1^2x_2^2)}{n^4} \]

- other terms of odd order which will vanish on summation.

Summing for all values and dividing by the number of cases we get

\[ R_{u^2s^2}\,\sigma_{u^2}\sigma_{s^2} + m'_1M'_1 = \frac{\mu_4}{n^2} + \mu_2^2\,\frac{n-1}{n^2} - \frac{\mu_4}{n^3} - 3\mu_2^2\,\frac{n-1}{n^3}, \]

where $R_{u^2s^2}$ is the correlation between $u^2$ and $s^2$.

\[ R_{u^2s^2}\,\sigma_{u^2}\sigma_{s^2} + \mu_2^2\,\frac{n-1}{n^2} = \mu_2^2\,\frac{n-1}{n^3}\{3 + n - 3\} = \mu_2^2\,\frac{n-1}{n^2}. \]

Hence $R_{u^2s^2}\,\sigma_{u^2}\sigma_{s^2} = 0$, or there is no correlation between $u^2$ and $s^2$.
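
The absence of correlation can be illustrated by simulation. The sketch below draws samples of 4 from a standard normal population and computes the ordinary product-moment correlation between the squared mean and the squared standard deviation; the function name, sample count and seed are hypothetical.

```python
import random

def correlation_u2_s2(n, samples, seed=2):
    """Correlation between u^2 (squared mean) and s^2 over simulated samples."""
    rng = random.Random(seed)
    us, ss = [], []
    for _ in range(samples):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        us.append(mean * mean)
        ss.append(sum(x * x for x in xs) / n - mean * mean)
    mu_u = sum(us) / samples
    mu_s = sum(ss) / samples
    cov = sum((u - mu_u) * (s - mu_s) for u, s in zip(us, ss)) / samples
    var_u = sum((u - mu_u) ** 2 for u in us) / samples
    var_s = sum((s - mu_s) ** 2 for s in ss) / samples
    return cov / (var_u * var_s) ** 0.5

r = correlation_u2_s2(n=4, samples=40000)
print(round(r, 4))
```

The coefficient should differ from zero only by an amount of the order of its sampling error.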

Section III 


To find the equation representing the frequency distribution of the means of samples
of $n$ drawn from a normal population, the mean being expressed in terms of the
standard deviation of the sample.

We have $y = \frac{C}{\sigma^{n-1}}s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}$ as the equation representing the distribution of $s$,
the standard deviation of a sample of $n$, when the samples are drawn from a normal
population with standard deviation $\sigma$.

Now the means of these samples of $n$ are distributed according to the equation*

\[ y = \frac{\sqrt{n}\,N}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{nx^2}{2\sigma^2}}, \]

and we have shown that there is no correlation between $x$, the distance of the mean of
the sample, and $s$, the standard deviation of the sample.

* Airy, Theory of Errors of Observations, Part II, §6.

Now let us suppose $x$ measured in terms of $s$, i.e. let us find the distribution of
$z = x/s$.

If we have $y_1 = \phi(x)$ and $y_2 = \psi(z)$ as the equations representing the frequency
of $x$ and of $z$ respectively, then

\[ y_1\,dx = y_2\,dz = y_2\,\frac{dx}{s}, \]
\[ \therefore\; y_2 = sy_1. \]

Hence

\[ y = \frac{N\sqrt{n}\,s}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{ns^2z^2}{2\sigma^2}} \]

is the equation representing the distribution of $z$ for samples of $n$ with standard deviation
$s$.

Now the chance that $s$ lies between $s$ and $s + ds$ is

\[ \frac{\int_s^{s+ds}\frac{C}{\sigma^{n-1}}s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}\,ds}{\int_0^\infty\frac{C}{\sigma^{n-1}}s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}\,ds}, \]

which represents the $N$ in the above equation.

Hence the distribution of $z$ due to values of $s$ which lie between $s$ and $s + ds$ is

\[ y = \sqrt{\frac{n}{2\pi}}\;\frac{\int_s^{s+ds}\frac{C}{\sigma^{n}}s^{n-1}e^{-\frac{ns^2(1+z^2)}{2\sigma^2}}\,ds}{\int_0^\infty\frac{C}{\sigma^{n-1}}s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}\,ds}, \]

and summing for all values of $s$ we have as an equation giving the distribution of $z$

\[ y = \sqrt{\frac{n}{2\pi}}\;\frac{\int_0^\infty\frac{C}{\sigma^{n}}s^{n-1}e^{-\frac{ns^2(1+z^2)}{2\sigma^2}}\,ds}{\int_0^\infty\frac{C}{\sigma^{n-1}}s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}\,ds}. \]

By what we have already proved this reduces to

\[ y = \frac{1}{2}\cdot\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{5}{4}\cdot\frac{3}{2}\,(1+z^2)^{-\frac{n}{2}} \quad \text{if } n \text{ be odd} \]

and to

\[ y = \frac{1}{\pi}\cdot\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{4}{3}\cdot\frac{2}{1}\,(1+z^2)^{-\frac{n}{2}} \quad \text{if } n \text{ be even.} \]

Since this equation is independent of $\sigma$ it will give the distribution of the distance
of the mean of a sample from the mean of the population expressed in terms of the
standard deviation of the sample for any normal population.
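
The constant of the curve of z can be evaluated directly from the product form. In the sketch below the product is compared with an equivalent expression in gamma functions; that gamma form is not given in the paper and is introduced here only as an independent check. The function names are hypothetical.

```python
import math

def z_constant_product(n):
    """Constant multiplying (1 + z^2)^(-n/2), built from the product form."""
    value = 0.5 if n % 2 else 1.0 / math.pi   # leading 1/2 (odd n) or 1/pi (even n)
    k = n - 2
    while k >= 2:
        value *= k / (k - 1)                  # factors (n-2)/(n-3), (n-4)/(n-5), ...
        k -= 2
    return value

def z_constant_gamma(n):
    """Same constant written with gamma functions (used only as a check)."""
    return math.gamma(n / 2) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2))

def z_density(z, n):
    """Ordinate of the frequency curve of z for samples of n."""
    return z_constant_product(n) * (1 + z * z) ** (-n / 2)

for n in (4, 7, 10):
    print(n, z_constant_product(n), z_constant_gamma(n))
```

Since the constant involves n alone, one table of the curve serves for every normal population, whatever its standard deviation.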


Section IV. Some Properties of the Standard 
Deviation Frequency Curve 


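By a similar method to that adopted for finding the constant we may find the mean
and moments: thus the mean is at $I_{n-1}/I_{n-2}$,
which is equal to

\[ \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{2}{1}\sqrt{\frac{2}{\pi}}\,\frac{\sigma}{\sqrt{n}} \quad \text{if } n \text{ be even,} \]

or

\[ \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{3}{2}\sqrt{\frac{\pi}{2}}\,\frac{\sigma}{\sqrt{n}} \quad \text{if } n \text{ be odd.} \]

The second moment about the end of the range is

\[ \frac{I_n}{I_{n-2}} = \frac{(n-1)\sigma^2}{n}. \]

The third moment about the end of the range is equal to

\[ \frac{I_{n+1}}{I_{n-2}} = \frac{I_{n+1}}{I_{n-1}}\cdot\frac{I_{n-1}}{I_{n-2}} = \sigma^2 \times \text{the mean}. \]

The fourth moment about the end of the range is equal to

\[ \frac{I_{n+2}}{I_{n-2}} = \frac{(n-1)(n+1)}{n^2}\,\sigma^4. \]

If we write the distance of the mean from the end of the range $D\sigma/\sqrt{n}$ and the
moments about the end of the range $\nu_1$, $\nu_2$, etc.,
then

\[ \nu_1 = \frac{D\sigma}{\sqrt{n}}, \quad \nu_2 = \frac{n-1}{n}\,\sigma^2, \quad \nu_3 = \frac{D\sigma^3}{\sqrt{n}}, \quad \nu_4 = \frac{n^2-1}{n^2}\,\sigma^4. \]

From this we get the moments about the mean:

\[ \mu_2 = \frac{\sigma^2}{n}(n - 1 - D^2), \]

\[ \mu_3 = \frac{\sigma^3}{n\sqrt{n}}\{nD - 3(n-1)D + 2D^3\} = \frac{\sigma^3D}{n\sqrt{n}}\{2D^2 - 2n + 3\}, \]

\[ \mu_4 = \frac{\sigma^4}{n^2}\{n^2 - 1 - 4D^2n + 6(n-1)D^2 - 3D^4\} = \frac{\sigma^4}{n^2}\{n^2 - 1 - D^2(3D^2 - 2n + 6)\}. \]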
It is of interest to find out what these become when n is large. 

In order to do this we must find out what is the value of D. 

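Now Wallis’s expression for $\pi$ derived from the infinite product value of $\sin x$ is

\[ \frac{\pi}{2}(2n+1) = \frac{2^2\cdot 4^2\cdot 6^2\cdots(2n)^2}{1^2\cdot 3^2\cdot 5^2\cdots(2n-1)^2}. \]

If we assume a quantity $\theta$ $\left(= a_0 + \frac{a_1}{n} + \text{etc.}\right)$ which we may add to the $2n + 1$ in
order to make the expression approximate more rapidly to the truth, it is easy to show
that $\theta = -\frac{1}{2} + \frac{1}{16n} - \text{etc.}$, and we get†

\[ \frac{\pi}{2}\left(2n + \frac{1}{2} + \frac{1}{16n}\right) = \frac{2^2\cdot 4^2\cdot 6^2\cdots(2n)^2}{1^2\cdot 3^2\cdot 5^2\cdots(2n-1)^2}. \]

From this we find that whether $n$ be even or odd $D^2$ approximates to $n - \frac{3}{2} + \frac{1}{8n}$
when $n$ is large.

Substituting this value of $D$ we get

\[ \mu_2 = \frac{\sigma^2}{2n}\left(1 - \frac{1}{4n}\right), \quad \mu_3 = \frac{\sigma^3\sqrt{1 - \frac{3}{2n} + \frac{1}{16n^2}}}{4n^2}, \quad \mu_4 = \frac{3\sigma^4}{4n^2}\left(1 + \frac{1}{2n} - \frac{1}{16n^2}\right). \]

† This expression will be found to give a much closer approximation to $\pi$ than Wallis’s.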
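
The footnote's claim about the corrected expression is easy to try numerically. The sketch below evaluates the finite product and solves each expression for $\pi$; the function names are hypothetical.

```python
import math

def wallis_ratio(n):
    """2^2 4^2 ... (2n)^2 divided by 1^2 3^2 ... (2n-1)^2."""
    value = 1.0
    for k in range(1, n + 1):
        value *= (2 * k) ** 2 / (2 * k - 1) ** 2
    return value

def pi_wallis(n):
    """pi from (pi/2)(2n + 1) = ratio."""
    return 2 * wallis_ratio(n) / (2 * n + 1)

def pi_corrected(n):
    """pi from (pi/2)(2n + 1/2 + 1/(16n)) = ratio."""
    return 2 * wallis_ratio(n) / (2 * n + 0.5 + 1 / (16 * n))

for n in (1, 5, 20):
    print(n, pi_wallis(n), pi_corrected(n), math.pi)
```

Even for n as small as 5 the corrected form is far nearer the truth than the unmodified product.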

Diagram I. Frequency Curve giving the Distribution of Standard Deviations of Samples
of 10 taken from a Normal Population

Consequently the value of the standard deviation of a standard deviation which we
have found, $\frac{\sigma}{\sqrt{2n}}\sqrt{1 - \frac{1}{4n}}$, becomes the same as that found for the normal curve
by Prof. Pearson, $\sigma/\sqrt{2n}$, when $n$ is large enough to neglect the $1/4n$ in comparison
with 1.

Neglecting terms of lower order than $1/n$, we find

\[ \beta_1 = \frac{2n-3}{n(4n-3)}, \qquad \beta_2 = 3\left(1 - \frac{1}{2n}\right)\left(1 + \frac{1}{2n}\right). \]

Consequently, as $n$ increases, $\beta_2$ very soon approaches the value 3 of the normal
curve, but $\beta_1$ vanishes more slowly, so that the curve remains slightly skew.

Diagram I shows the theoretical distribution of the standard deviations found from 
samples of 10. 

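Section V. Some Properties of the Curve

\[ y = \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\left\{\begin{array}{ll}\frac{4}{3}\cdot\frac{2}{\pi} & \text{if } n \text{ be even}\\[4pt] \frac{5}{4}\cdot\frac{3}{2}\cdot\frac{1}{2} & \text{if } n \text{ be odd}\end{array}\right\}(1+z^2)^{-\frac{n}{2}} \]

Writing $z = \tan\theta$ the equation becomes $y = \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\cos^n\theta$, which affords
an easy way of drawing the curve. Also $dz = d\theta/\cos^2\theta$.

Hence to find the area of the curve between any limits we must find

\[ \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int\cos^{n-2}\theta\,d\theta \]
\[ \quad = \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\left\{\frac{n-3}{n-2}\int\cos^{n-4}\theta\,d\theta + \left[\frac{\cos^{n-3}\theta\sin\theta}{n-2}\right]\right\} \]
\[ \quad = \frac{n-4}{n-5}\cdots\text{etc.}\int\cos^{n-4}\theta\,d\theta + \frac{1}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\,[\cos^{n-3}\theta\sin\theta], \]

and by continuing the process the integral may be evaluated.

For example, if we wish to find the area between 0 and $\theta$ for $n = 8$ we have

\[ \text{Area} = \frac{6}{5}\cdot\frac{4}{3}\cdot\frac{2}{1}\cdot\frac{1}{\pi}\int_0^\theta\cos^6\theta\,d\theta \]
\[ \quad = \frac{4}{3}\cdot\frac{2}{\pi}\int_0^\theta\cos^4\theta\,d\theta + \frac{1}{5}\cdot\frac{4}{3}\cdot\frac{2}{\pi}\cos^5\theta\sin\theta \]
\[ \quad = \frac{\theta}{\pi} + \frac{1}{\pi}\cos\theta\sin\theta + \frac{1}{3}\cdot\frac{2}{\pi}\cos^3\theta\sin\theta + \frac{1}{5}\cdot\frac{4}{3}\cdot\frac{2}{\pi}\cos^5\theta\sin\theta, \]

and it will be noticed that for $n = 10$ we shall merely have to add to this same expression
the term $\frac{1}{7}\cdot\frac{6}{5}\cdot\frac{4}{3}\cdot\frac{2}{\pi}\cos^7\theta\sin\theta$.

The tables at the end of the paper give the area between $-\infty$ and $z$
(or $\theta = -\frac{\pi}{2}$ and $\theta = \tan^{-1}z$).

This is the same as 0.5 + the area between $\theta = 0$, and $\theta = \tan^{-1}z$, and as the
whole area of the curve is equal to 1, the tables give the probability that the mean of
the sample does not differ by more than $z$ times the standard deviation of the sample
from the mean of the population.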
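
The reduction described in this section translates directly into a short routine. The sketch below starts from the closed forms for n = 2 and n = 3 and adds one term of the form (constant) × cos^k θ sin θ for each further step of two in n; the function name is hypothetical.

```python
import math

def z_area(z, n):
    """Area of the curve of z from minus infinity to z, by successive reduction."""
    theta = math.atan(z)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    if n % 2 == 0:
        m, const, area = 2, 1.0 / math.pi, 0.5 + theta / math.pi   # n = 2 curve
    else:
        m, const, area = 3, 0.5, 0.5 + 0.5 * sin_t                 # n = 3 curve
    while m < n:
        m += 2
        # term added on passing from m - 2 to m
        area += const * cos_t ** (m - 3) * sin_t / (m - 3)
        # constant of the curve for samples of m
        const *= (m - 2) / (m - 3)
    return area

print(round(z_area(1.0, 6), 4))
print(round(z_area(1.3, 10), 5))
```

For samples of 6 at z = 1.0 and samples of 10 at z = 1.3 the routine gives values agreeing with the tabled 0.9622 and 0.99819 to the places shown there.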

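The whole area of the curve is equal to

\[ \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{1}{2}\pi}^{+\frac{1}{2}\pi}\cos^{n-2}\theta\,d\theta, \]

and since all the parts between the limits vanish at both limits this reduces to 1.
Similarly, the second moment coefficient is equal to

\[ \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{1}{2}\pi}^{+\frac{1}{2}\pi}\cos^{n-2}\theta\tan^2\theta\,d\theta \]
\[ \quad = \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{1}{2}\pi}^{+\frac{1}{2}\pi}(\cos^{n-4}\theta - \cos^{n-2}\theta)\,d\theta \]
\[ \quad = \frac{n-2}{n-3} - 1 = \frac{1}{n-3}. \]

Diagram II. Solid curve: $y = \frac{N}{s}\times\frac{8}{7}\cdot\frac{6}{5}\cdot\frac{4}{3}\cdot\frac{2}{\pi}\cos^{10}\theta$, $x/s = \tan\theta$.
Broken line curve: the normal curve with the same standard deviation.

Hence the standard deviation of the curve is $1/\sqrt{n-3}$. The fourth moment
coefficient is equal to

\[ \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{1}{2}\pi}^{+\frac{1}{2}\pi}\cos^{n-2}\theta\tan^4\theta\,d\theta \]
\[ \quad = \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{1}{2}\pi}^{+\frac{1}{2}\pi}(\cos^{n-6}\theta - 2\cos^{n-4}\theta + \cos^{n-2}\theta)\,d\theta \]
\[ \quad = \frac{n-2}{n-3}\cdot\frac{n-4}{n-5} - \frac{2(n-2)}{n-3} + 1 = \frac{3}{(n-3)(n-5)}. \]

The odd moments are of course zero, as the curve is symmetrical, so

\[ \beta_1 = 0, \qquad \beta_2 = \frac{3(n-3)}{n-5} = 3 + \frac{6}{n-5}. \]

Hence as $n$ increases the curve approaches the normal curve whose standard deviation
is $1/\sqrt{n-3}$.

$\beta_2$, however, is always greater than 3, indicating that large deviations are more
common than in the normal curve.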

I have tabled the area for the normal curve with standard deviation $1/\sqrt{7}$ so as to
compare with my curve for $n = 10$*. It will be seen that odds laid according to either
table would not seriously differ till we reach $z = 0.8$, where the odds are about 50 to
1 that the mean is within that limit: beyond that the normal curve gives a false feeling
of security, for example, according to the normal curve it is 99,986 to 14 (say 7000 to
1) that the mean of the population lies between $-\infty$ and $+1.3s$, whereas the real odds
are only 99,819 to 181 (about 550 to 1).

* See p. 29

Now 50 to 1 corresponds to three times the probable error in the normal curve and
for most purposes it would be considered significant; for this reason I have only tabled
my curves for values of $n$ not greater than 10, but have given the $n = 9$ and $n = 10$
tables to one further place of decimals. They can be used as foundations for finding
values for larger samples.†

The table for $n = 2$ can be readily constructed by looking out $\theta = \tan^{-1}z$ in
Chambers’s tables and then $0.5 + \theta/\pi$ gives the corresponding value.

Similarly $\frac{1}{2}\sin\theta + 0.5$ gives the values when $n = 3$.

There are two points of interest in the $n = 2$ curve. Here $s$ is equal to half the
distance between the two observations, $\tan^{-1}\frac{s}{s} = \frac{\pi}{4}$, so that between $+s$ and $-s$ lies
$2 \times \frac{\pi}{4} \times \frac{1}{\pi}$ or half the probability, i.e. if two observations have been made and we have
no other information, it is an even chance that the mean of the (normal) population will
lie between them. On the other hand the second moment coefficient is

\[ \frac{1}{\pi}\int_{-\frac{1}{2}\pi}^{+\frac{1}{2}\pi}\tan^2\theta\,d\theta = \frac{1}{\pi}\Big[\tan\theta - \theta\Big]_{-\frac{1}{2}\pi}^{+\frac{1}{2}\pi} = \infty, \]

or the standard deviation is infinite while the probable error is finite.
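
The even chance for two observations can be illustrated by simulation. The sketch below assumes a normal population with arbitrarily chosen mean and standard deviation; the function name, the parameter values and the seed are hypothetical.

```python
import random

def fraction_bracketing_mean(trials, mu=10.0, sigma=3.0, seed=3):
    """Share of simulated pairs of observations that lie on opposite sides of mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = rng.gauss(mu, sigma)
        b = rng.gauss(mu, sigma)
        if min(a, b) < mu < max(a, b):
            hits += 1
    return hits / trials

share = fraction_bracketing_mean(trials=40000)
print(round(share, 3))
```

The share should come out close to one half whatever mean and standard deviation are assumed.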

Section VI. Practical Test of the foregoing Equations 

Before I had succeeded in solving my problem analytically, I had endeavoured to
do so empirically. The material used was a correlation table containing the height and
left middle finger measurements of 3000 criminals, from a paper by W. R. Macdonnell
(Biometrika, I, p. 219). The measurements were written out on 3000 pieces of cardboard,
which were then very thoroughly shuffled and drawn at random. As each card
was drawn its numbers were written down in a book, which thus contains the measurements
of 3000 criminals in a random order. Finally, each consecutive set of 4 was taken
as a sample (750 in all) and the mean, standard deviation, and correlation‡ of each
sample determined. The difference between the mean of each sample and the mean of
the population was then divided by the standard deviation of the sample, giving us the
$z$ of Section III.

This provides us with two sets of 750 standard deviations and two sets of 750
$z$’s on which to test the theoretical results arrived at. The height and left middle finger
correlation table was chosen because the distribution of both was approximately normal
and the correlation was fairly high. Both frequency curves, however, deviate slightly
from normality, the constants being for height $\beta_1 = 0.0026$, $\beta_2 = 3.176$, and for left
middle finger lengths $\beta_1 = 0.0030$, $\beta_2 = 3.140$, and in consequence there is a tendency
for a certain number of larger standard deviations to occur than if the distributions were
normal. This, however, appears to make very little difference to the distribution of $z$.

† E.g. if $n = 11$, to the corresponding value for $n = 9$, we add $\frac{7}{8}\times\frac{5}{6}\times\frac{3}{4}\times\frac{1}{2}\times\frac{1}{2}\cos^8\theta\sin\theta$: if
$n = 13$ we add as well $\frac{9}{10}\times\frac{7}{8}\times\frac{5}{6}\times\frac{3}{4}\times\frac{1}{2}\times\frac{1}{2}\cos^{10}\theta\sin\theta$, and so on.

‡ I hope to publish the results of the correlation work shortly.

Another thing which interferes with the comparison is the comparatively large
groups in which the observations occur. The heights are arranged in 1 inch groups,
the standard deviation being only 2.54 inches, while the finger lengths were originally
grouped in millimetres, but unfortunately I did not at the time see the importance of
having a smaller unit and condensed them into 2 millimetre groups, in terms of which
the standard deviation is 2.74.

Several curious results follow from taking samples of 4 from material disposed in 
such wide groups. The following points may be noticed: 

(1) The means only occur as multiples of 0.25.

(2) The standard deviations occur as the square roots of the following types of
numbers: $n$, $n + 0.19$, $n + 0.25$, $n + 0.50$, $n + 0.69$, $2n + 0.75$.

(3) A standard deviation belonging to one of these groups can only be associated
with a mean of a particular kind; thus a standard deviation of $\sqrt{2}$ can only occur if the
mean differs by a whole number from the group we take as origin, while $\sqrt{1.69}$ will
only occur when the mean is at $n \pm 0.25$.

(4) All the four individuals of the sample will occasionally come from the same
group, giving a zero value for the standard deviation. Now this leads to an infinite
value of $z$ and is clearly due to too wide a grouping, for although two men may have
the same height when measured by inches, yet the finer the measurements the more
seldom will they be identical, till finally the chance that four men will have exactly
the same height is infinitely small. If we had smaller grouping the zero values of the
standard deviation might be expected to increase, and a similar consideration will show
that the smaller values of the standard deviation would also be likely to increase, such
as 0.436, when 3 fall in one group and 1 in an adjacent group, or 0.50 when 2 fall in
two adjacent groups. On the other hand, when the individuals of the sample lie far
apart, the argument of Sheppard’s correction will apply, the real value of the standard
deviation being more likely to be smaller than that found owing to the frequency in any
group being greater on the side nearer the mode.
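
Point (2) can be checked by direct enumeration. The sketch below takes four whole-number group values at a time, computes the squared standard deviation (divisor 4) exactly, and collects the fractional parts that occur; the function name and the range of group values are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def variance_fractional_parts(group_values, size=4):
    """Fractional parts of s^2 (divisor = size) over all samples of whole numbers."""
    parts = set()
    for sample in combinations_with_replacement(group_values, size):
        mean = Fraction(sum(sample), size)
        var = Fraction(sum(x * x for x in sample), size) - mean * mean
        parts.add(var - (var.numerator // var.denominator))   # drop the whole part
    return sorted(parts)

parts = variance_fractional_parts(range(6))
print([float(p) for p in parts])
```

Only six fractional parts arise, namely 0, 3/16, 1/4, 1/2, 11/16 and 3/4, which to two decimals are the 0.19, 0.25, 0.50, 0.69 and 0.75 of the list above.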

These two effects of grouping will tend to neutralize the effect on the mean value 
of the standard deviation, but both will increase the variability. 

Accordingly, we find that the mean value of the standard deviation is quite close to 
that calculated, while in each case the variability is sensibly greater. The fit of the curve 
is not good, both for this reason and because the frequency is not evenly distributed 
owing to effects (2) and (3) of grouping. On the other hand, the fit of the curve giving 
the frequency of z is very good, and as that is the only practical point the comparison 
may be considered satisfactory.

The following are the figures for height:

Mean value of standard deviations:
    Calculated    2.027 ± 0.02
    Observed      2.026
    Difference    -0.001

Standard deviation of standard deviations:
    Calculated    0.8558 ± 0.015
    Observed      0.9066
    Difference    +0.0510
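
Comparison of Fit. Theoretical Equation: $y = \frac{16 \times 750}{\sqrt{2\pi}\,\sigma^3}\,x^2e^{-\frac{2x^2}{\sigma^2}}$

Scale in terms of standard      Calculated    Observed     Difference
deviations of population        frequency     frequency
0 to 0.1                        1½            3            +1½
0.1 to 0.2                      10½           14½          +4
0.2 to 0.3                      27            24½          -2½
0.3 to 0.4                      45½           37½          -8
0.4 to 0.5                      64½           107          +42½
0.5 to 0.6                      78½           67           -11½
0.6 to 0.7                      87            73           -14
0.7 to 0.8                      88            77           -11
0.8 to 0.9                      81½           77½          -4
0.9 to 1.0                      71            64           -7
1.0 to 1.1                      58            52½          -5½
1.1 to 1.2                      45            49½          +4½
1.2 to 1.3                      33            35           +2
1.3 to 1.4                      23            28           +5
1.4 to 1.5                      15            12½          -2½
1.5 to 1.6                      9½            9            -½
1.6 to 1.7                      5½            11½          +6
greater than 1.7                7             7            0

Whence $\chi^2 = 48.06$, $P = 0.00006$ (about).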
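
The value of $\chi^2$ can be recomputed from the tabled frequencies. In the sketch below the frequencies are as read from the table above, with the observed entry for the group 1.0 to 1.1 taken as 52½ on the assumption that the observed column sums to the 750 samples; the variable and function names are hypothetical.

```python
# Calculated and observed frequencies of standard deviations (height samples)
calculated = [1.5, 10.5, 27, 45.5, 64.5, 78.5, 87, 88, 81.5,
              71, 58, 45, 33, 23, 15, 9.5, 5.5, 7]
observed = [3, 14.5, 24.5, 37.5, 107, 67, 73, 77, 77.5,
            64, 52.5, 49.5, 35, 28, 12.5, 9, 11.5, 7]

def chi_square(observed, calculated):
    """Sum of (observed - calculated)^2 / calculated over the groups."""
    return sum((o - c) ** 2 / c for o, c in zip(observed, calculated))

chi2 = chi_square(observed, calculated)
print(round(chi2, 2))
```

The result lies within a few hundredths of the 48.06 quoted, the small difference being what one would expect from the rounding of the calculated frequencies to halves. The single group 0.4 to 0.5 contributes more than half of the total.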


In tabling the observed frequency, values between 0.0125 and 0.0875 were included
in one group, while between 0.0875 and 0.0125 they were divided over the two groups.

As an instance of the irregularity due to grouping I may mention that there were 31
cases of standard deviations 1.30 (in terms of the grouping) which is 0.5117 in terms
of the standard deviation of the population, and they were therefore divided over the
groups 0.4 to 0.5 and 0.5 to 0.6. Had they all been counted in groups 0.5 to 0.6 $\chi^2$
would have fallen to 20.85 and $P$ would have risen to 0.03. The $\chi^2$ test presupposes
random sampling from a frequency following the given law, but this we have not got
owing to the interference of the grouping.

When, however, we test the $z$’s where the grouping has not had so much effect, we
find a close correspondence between the theory and the actual result.

There were three cases of infinite values of $z$ which, for the reasons given above,
were given the next largest values which occurred, namely +6 or -6. The rest were
divided into groups of 0.1; 0.04, 0.05 and 0.06, being divided between the two groups
on either side.

The calculated value for the standard deviation of the frequency curve was 1 (± 0.017),
while the observed was 1.030. The value of the standard deviation is really infinite, as
the fourth moment coefficient is infinite, but as we have arbitrarily limited the infinite
cases we may take as an approximation $1/\sqrt{1500}$ from which the value of the probable
error given above is obtained. The fit of the curve is as follows:


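Comparison of Fit. Theoretical Equation: $y = \frac{2N}{\pi}\cos^4\theta$, $z = \tan\theta$

Scale of z            Calculated    Observed     Difference
                      frequency     frequency
less than -3.05       5             9            +4
-3.05 to -2.05        9½            14½          +5
-2.05 to -1.55        13½           11½          -2
-1.55 to -1.05        34½           33           -1½
-1.05 to -0.75        44½           43½          -1
-0.75 to -0.45        78½           70½          -8
-0.45 to -0.15        119           119½         +½
-0.15 to +0.15        141           151½         +10½
+0.15 to +0.45        119           122          +3
+0.45 to +0.75        78½           67½          -11
+0.75 to +1.05        44½           49           +4½
+1.05 to +1.55        34½           26½          -8
+1.55 to +2.05        13½           16           +2½
+2.05 to +3.05        9½            10           +½
more than +3.05       5             6            +1

Whence $\chi^2 = 12.44$, $P = 0.56$.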


This is very satisfactory, especially when we consider that as a rule observations are
tested against curves fitted from the mean and one or more other moments of the observations,
so that considerable correspondence is only to be expected; while this curve
is exposed to the full errors of random sampling, its constants having been calculated
quite apart from the observations.

The left middle finger samples show much the same features as those of the height,
but as the grouping is not so large compared to the variability the curves fit the observations
more closely. Diagrams III‖ and IV give the standard deviations and the $z$’s for
the set of samples. The results are as follows:


Mean value of standard deviations:
    Calculated    2.186 ± 0.023
    Observed      2.179
    Difference    -0.007

Standard deviation of standard deviations:
    Calculated    0.9224 ± 0.016
    Observed      0.9802
    Difference    +0.0578

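Comparison of Fit. Theoretical Equation: $y = \frac{16 \times 750}{\sqrt{2\pi}\,\sigma^3}\,x^2e^{-\frac{2x^2}{\sigma^2}}$

Scale in terms of standard      Calculated    Observed     Difference
deviations of population        frequency     frequency
0 to 0.1                        1½            2            +½
0.1 to 0.2                      10½           14           +3½
0.2 to 0.3                      27            27½          +½
0.3 to 0.4                      45½           51           +5½
0.4 to 0.5                      64½           64½          0
0.5 to 0.6                      78½           91           +12½
0.6 to 0.7                      87            94½          +7½
0.7 to 0.8                      88            68½          -19½
0.8 to 0.9                      81½           65½          -16
0.9 to 1.0                      71            73           +2
1.0 to 1.1                      58            48½          -9½
1.1 to 1.2                      45            40½          -4½
1.2 to 1.3                      33            42½          +9½
1.3 to 1.4                      23            20           -3
1.4 to 1.5                      15            22½          +7½
1.5 to 1.6                      9½            12           +2½
1.6 to 1.7                      5½            5            -½
greater than 1.7                7             7½           +½

Whence $\chi^2 = 21.80$, $P = 0.19$.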


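Value of standard deviation:
    Calculated    1 (± 0.017)
    Observed      0.982
    Difference    -0.018

Comparison of Fit. Theoretical Equation: $y = \frac{2N}{\pi}\cos^4\theta$, $z = \tan\theta$

Scale of z            Calculated    Observed     Difference
                      frequency     frequency
less than -3.05       5             4            -1
-3.05 to -2.05        9½            15½          +6
-2.05 to -1.55        13½           18           +4½
-1.55 to -1.05        34½           33½          -1
-1.05 to -0.75        44½           44           -½
-0.75 to -0.45        78½           75           -3½
-0.45 to -0.15        119           122          +3
-0.15 to +0.15        141           138          -3
+0.15 to +0.45        119           120½         +1½
+0.45 to +0.75        78½           71           -7½
+0.75 to +1.05        44½           46½          +2
+1.05 to +1.55        34½           36           +1½
+1.55 to +2.05        13½           11           -2½
+2.05 to +3.05        9½            9            -½
more than +3.05       5             6            +1

Whence $\chi^2 = 7.39$, $P = 0.92$.

A very close fit.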

We see then that if the distribution is approximately normal our theory gives us
a satisfactory measure of the certainty to be derived from a small sample in both the
cases we have tested; but we have an indication that a fine grouping is of advantage. If
the distribution is not normal, the mean and the standard deviation of a sample will be
positively correlated, so although both will have greater variability, yet they will tend
to counteract one another, a mean deviating largely from the general mean tending to be
divided by a larger standard deviation. Consequently, I believe that the table given in
Section VII below may be used in estimating the degree of certainty arrived at by the
mean of a few experiments, in the case of most laboratory or biological work where
the distributions are as a rule of a “cocked hat” type and so sufficiently nearly normal.

‖ There are three small mistakes in plotting the observed values in Diagram III, which make the fit appear
worse than it really is.

Section VII. Tables of

\[ \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\left\{\begin{array}{ll}\frac{3}{2}\cdot\frac{1}{2} & n \text{ odd}\\[4pt] \frac{2}{1}\cdot\frac{1}{\pi} & n \text{ even}\end{array}\right\}\int_{-\frac{1}{2}\pi}^{\tan^{-1}z}\cos^{n-2}\theta\,d\theta \]

FOR VALUES OF n FROM 4 TO 10 INCLUSIVE

Together with $\frac{\sqrt{7}}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{7x^2}{2}}\,dx$ for comparison when $n = 10$

z (= x/s)   n = 4    n = 5    n = 6    n = 7    n = 8    n = 9     n = 10    For comparison
0.1         0.5633   0.5745   0.5841   0.5928   0.6006   0.60787   0.61462   0.60411
0.2         0.6241   0.6458   0.6634   0.6798   0.6936   0.70705   0.71846   0.70159
0.3         0.6804   0.7096   0.7340   0.7549   0.7733   0.78961   0.80423   0.78641
0.4         0.7309   0.7657   0.7939   0.8175   0.8376   0.85465   0.86970   0.85520
0.5         0.7749   0.8131   0.8428   0.8667   0.8863   0.90251   0.91609   0.90691
0.6         0.8125   0.8518   0.8813   0.9040   0.9218   0.93600   0.94732   0.94375
0.7         0.8440   0.8830   0.9109   0.9314   0.9468   0.95851   0.96747   0.96799
0.8         0.8701   0.9076   0.9332   0.9512   0.9640   0.97328   0.98007   0.98253
0.9         0.8915   0.9269   0.9498   0.9652   0.9756   0.98279   0.98780   0.99137
1.0         0.9092   0.9419   0.9622   0.9751   0.9834   0.98890   0.99252   0.99820
1.1         0.9236   0.9537   0.9714   0.9821   0.9887   0.99280   0.99539   0.99926
1.2         0.9354   0.9628   0.9782   0.9870   0.9922   0.99528   0.99713   0.99971
1.3         0.9451   0.9700   0.9832   0.9905   0.9946   0.99688   0.99819   0.99986
1.4         0.9531   0.9756   0.9870   0.9930   0.9962   0.99791   0.99885   0.99989
1.5         0.9598   0.9800   0.9899   0.9948   0.9973   0.99859   0.99926   0.99999
1.6         0.9653   0.9836   0.9920   0.9961   0.9981   0.99903   0.99951
1.7         0.9699   0.9864   0.9937   0.9970   0.9986   0.99933   0.99968
1.8         0.9737   0.9886   0.9950   0.9977   0.9990   0.99953   0.99978
1.9         0.9770   0.9904   0.9959   0.9983   0.9992   0.99967   0.99985
2.0         0.9797   0.9919   0.9967   0.9986   0.9994   0.99976   0.99990
2.1         0.9821   0.9931   0.9973   0.9989   0.9996   0.99983   0.99993
2.2         0.9841   0.9941   0.9978   0.9992   0.9997   0.99987   0.99995
2.3         0.9858   0.9950   0.9982   0.9993   0.9998   0.99991   0.99996
2.4         0.9873   0.9957   0.9985   0.9995   0.9998   0.99993   0.99997
2.5         0.9886   0.9963   0.9987   0.9996   0.9998   0.99995   0.99998
2.6         0.9898   0.9967   0.9989   0.9996   0.9999   0.99996   0.99999
2.7         0.9908   0.9972   0.9989   0.9997   0.9999   0.99997   0.99999
2.8         0.9916   0.9975   0.9989   0.9998   0.9999   0.99998   0.99999
2.9         0.9924   0.9978   0.9989   0.9998   0.9999   0.99998   0.99999
3.0         0.9931   0.9981   0.9989   0.9998   -        0.99999   -


Explanation of Tables 

The tables give the probability that the value of the mean, measured from the mean
of the population, in terms of the standard deviation of the sample, will lie between
−∞ and z. Thus, to take the table for samples of 6, the probability of the mean of the
population lying between −∞ and once the standard deviation of the sample is 0.9622,
or the odds are about 24 to 1 that the mean of the population lies between these limits.

The probability is therefore 0.0378 that it is greater than once the standard deviation
and 0.0756 that it lies outside ±1.0 times the standard deviation.
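
The entries of the table can be checked numerically. What follows is a minimal sketch, not the method by which the table was computed: it assumes the tabulated quantity is the normalised integral of cos^(n-2) θ from −π/2 to tan⁻¹ z, writes the normalising constant with gamma functions, and evaluates the integral by Simpson's rule. The function name `student_z_cdf` and the step count are illustrative choices.

```python
import math


def student_z_cdf(z, n, steps=2000):
    """Probability that z lies between -infinity and the given value, samples of n.

    Assumption: the tabulated quantity is a normalising constant times the
    integral of cos^(n-2)(theta) from -pi/2 to atan(z).
    """
    # Constant chosen so that the integral over (-pi/2, pi/2) equals 1.
    const = math.gamma(n / 2) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2))
    lower, upper = -math.pi / 2, math.atan(z)
    h = (upper - lower) / steps  # steps must be even for Simpson's rule
    total = math.cos(lower) ** (n - 2) + math.cos(upper) ** (n - 2)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * math.cos(lower + i * h) ** (n - 2)
    return const * total * h / 3


p = student_z_cdf(1.0, 6)        # samples of 6, z equal to once the S.D.
print(round(p, 4))               # probability of lying below z
print(round(1 - p, 4))           # probability of exceeding z
print(round(2 * (1 - p), 4))     # probability of lying outside +/- z
```

The three printed figures correspond to the three probabilities quoted in the explanation for samples of 6.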

Illustration of Method 

Illustration I. As an instance of the kind of use which may be made of the tables,
I take the following figures from a table by A. R. Cushny and A. R. Peebles in the
Journal of Physiology for 1904, showing the different effects of the optical isomers of
hyoscyamine hydrobromide in producing sleep. The average number of hours’ sleep
gained by the use of the drug is tabulated below.

The conclusion arrived at was that in the usual dose 2 was, but 1 was not, of value
as a soporific.


Additional hours’ sleep gained by the use of hyoscyamine hydrobromide

Patient    1 (Dextro-)    2 (Laevo-)    Difference (2 - 1)

   1          +0.7           +1.9             +1.2
   2          -1.6           +0.8             +2.4
   3          -0.2           +1.1             +1.3
   4          -1.2           +0.1             +1.3
   5          -0.1           -0.1              0
   6          +3.4           +4.4             +1.0
   7          +3.7           +5.5             +1.8
   8          +0.8           +1.6             +0.8
   9           0             +4.6             +4.6
  10          +2.0           +3.4             +1.4

 Mean         +0.75          +2.33            +1.58
 S.D.          1.70           1.90             1.17


First let us see what is the probability that 1 will on the average give increase of
sleep; i.e. what is the chance that the mean of the population of which these experiments
are a sample is positive. +0.75/1.70 = 0.44, and looking out z = 0.44 in the
table for ten experiments we find by interpolating between 0.8697 and 0.9161 that 0.44
corresponds to 0.8873, or the odds are 0.887 to 0.113 that the mean is positive.

That is about 8 to 1, and would correspond in the normal curve to about 1.8 times
the probable error. It is then very likely that 1 gives an increase of sleep, but it would
occasion no surprise if the results were reversed by further experiments.
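
The conversion of a probability into a multiple of the probable error can be sketched as follows. This is an illustration only: it assumes the probable error is the deviation that a normal variable exceeds in absolute value half the time (about 0.6745 standard deviations), and the name `probable_error_multiple` is hypothetical.

```python
from statistics import NormalDist

# Assumed definition: the probable error is the 75% point of the unit normal curve.
PROBABLE_ERROR = NormalDist().inv_cdf(0.75)


def probable_error_multiple(p):
    """Normal deviate with one-sided probability p, in units of the probable error."""
    return NormalDist().inv_cdf(p) / PROBABLE_ERROR


# The probability 0.8873 found above for drug 1.
print(round(probable_error_multiple(0.8873), 2))
```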

If now we consider the chance that 2 is actually a soporific we have the mean
increase of sleep = 2.33/1.90 or 1.23 times the S.D. From the table the probability
corresponding to this is 0.9974, i.e. the odds are nearly 400 to 1 that such is the case.
This corresponds to about 4.15 times the probable error in the normal curve. But I take
it that the real point of the authors was that 2 is better than 1. This we must test by
making a new series, subtracting 1 from 2. The mean value of this series is +1.58,
while the S.D. is 1.17, the mean value being +1.35 times the S.D. From the table, the
probability is 0.9985, or the odds are about 666 to one that 2 is the better soporific. The
low value of the S.D. is probably due to the different drugs reacting similarly on the
same patient, so that there is correlation between the results.

Of course odds of this kind make it almost certain that 2 is the better soporific, and 
in practical life such a high probability is in most matters considered as a certainty. 
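
The whole of Illustration I can be retraced in a few lines. This is a sketch under stated assumptions: the S.D. is taken with divisor n, as the figures in the table imply, and the probability is computed by numerical integration rather than by interpolating in the table, so the results agree with the quoted ones only approximately. The helper names are hypothetical.

```python
import math


def student_z_cdf(z, n, steps=2000):
    """Normalised integral of cos^(n-2) from -pi/2 to atan(z), by Simpson's rule."""
    const = math.gamma(n / 2) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2))
    lower, upper = -math.pi / 2, math.atan(z)
    h = (upper - lower) / steps
    total = math.cos(lower) ** (n - 2) + math.cos(upper) ** (n - 2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(lower + i * h) ** (n - 2)
    return const * total * h / 3


def summarise(values):
    """Return mean, S.D. (divisor n), z = mean / S.D. and the probability for z."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = mean / sd
    return mean, sd, z, student_z_cdf(z, n)


dextro = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
laevo = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]
difference = [b - a for a, b in zip(dextro, laevo)]

for label, series in [("1", dextro), ("2", laevo), ("2 - 1", difference)]:
    mean, sd, z, p = summarise(series)
    print(label, round(mean, 2), round(sd, 2), round(z, 2), round(p, 4))
```

Working on the series of differences, rather than on the two drugs separately, is what allows the correlation between results on the same patient to reduce the S.D.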

Illustration II. Cases where the tables will be useful are not uncommon in agricultural
work, and they would be more numerous if the advantages of being able to apply
statistical reasoning were borne in mind when planning the experiments. I take the
following instances from the accounts of the Woburn farming experiments published
yearly by Dr Voelcker in the Journal of the Agricultural Society.

A short series of pot culture experiments were conducted in order to determine the
causes which lead to the production of Hard (glutinous) wheat or Soft (starchy) wheat.
In three successive years a bulk of seed corn of one variety was picked over by hand
and two samples were selected, one consisting of “hard” grains and the other of “soft”.
Some of each of them were planted in both heavy and light soil and the resulting crops
were weighed and examined for hard and soft corn.

The conclusion drawn was that the effect of selecting the seed was negligible compared
with the influence of the soil.

This conclusion was thoroughly justified, the heavy soil producing in each case
nearly 100% of hard corn, but still the effect of selecting the seed could just be traced
in each year.

But a curious point, to which Dr Voelcker draws attention in the second year’s 
report, is that the soft seeds produced the higher yield of both corn and straw. In 
view of the well-known fact that the varieties which have a high yield tend to produce 
soft corn, it is interesting to see how much evidence the experiments afford as to the 
correlation between softness and fertility in the same variety. 

Further, Mr Hooker** has shown that the yield of wheat in one year is largely
determined by the weather during the preceding year. Dr Voelcker’s results may afford
a clue as to the way in which the seed is affected, and would almost justify the selection
of particular soils for growing wheat.††

The figures are as follows, the yields being expressed in grammes per pot:


Year                                   1899              1900              1901
Soil                              Light    Heavy    Light    Heavy    Light    Heavy    Average   Standard deviation    z

Yield of corn from soft seed       7.85     8.89    14.81    13.55     7.48    15.39    11.328
Yield of corn from hard seed       7.27     8.32    13.81    13.36     7.97    13.13    10.643
Difference                        +0.58    +0.57    +1.00    +0.19    -0.49    +2.26    +0.685         0.778           0.88

Yield of straw from soft seed     12.81    12.87    22.22    20.21    13.97    22.57    17.442
Yield of straw from hard seed     10.71    12.48    21.64    20.26    11.71    18.96    15.927
Difference                        +2.10    +0.39    +0.78    -0.05    +2.66    +3.61    +1.515         1.261           1.20


If we wish to find the odds that the soft seed will give a better yield of corn on the
average, we divide the average difference by the standard deviation, giving us

z = 0.88.


Looking this up in the table for n = 6 we find p = 0.9465, or the odds are 0.9465 to
0.0535, about 18 to 1.

Similarly for straw z = 1.20, p = 0.9782, and the odds are about 45 to 1.
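
Looking a value of z up in the table and turning the result into odds can be sketched as below. The sketch assumes simple linear interpolation between neighbouring rows, and copies only a few rows of the n = 6 column; `TABLE_N6`, `lookup` and `odds` are hypothetical names.

```python
from bisect import bisect_left

# A few rows of the n = 6 column of the table (z -> probability).
TABLE_N6 = {0.8: 0.9332, 0.9: 0.9498, 1.0: 0.9622,
            1.1: 0.9714, 1.2: 0.9782, 1.3: 0.9832}


def lookup(z, table):
    """Linear interpolation between the two tabulated rows that bracket z."""
    zs = sorted(table)
    if not zs[0] <= z <= zs[-1]:
        raise ValueError("z lies outside the rows copied from the table")
    i = bisect_left(zs, z)
    if zs[i] == z:
        return table[z]
    lo, hi = zs[i - 1], zs[i]
    fraction = (z - lo) / (hi - lo)
    return table[lo] + fraction * (table[hi] - table[lo])


def odds(p):
    """Odds in favour, expressed as 'so many to 1'."""
    return p / (1 - p)


for name, z in [("corn", 0.88), ("straw", 1.20)]:
    p = lookup(z, TABLE_N6)
    print(name, round(p, 4), round(odds(p), 1))
```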

In order to see whether such odds are sufficient for a practical man to draw a definite
conclusion, I take another set of experiments in which Dr Voelcker compares the effects
of different artificial manures used with potatoes on a large scale.

The figures represent the difference between the crops grown with the use of sulphate
of potash and kainit respectively in both 1904 and 1905:

             cwt.  qr.  lb.        ton  cwt.  qr.  lb.
1904    +     10    3   20    :  +   1   10    1   26    (two experiments in each year)
1905    +      6    0    3    :  +       13    2    8

** Journal of the Royal Statistical Society, 1897 

†† And perhaps a few experiments to see whether there is a correlation between yield and “mellowness” in
barley.

The average gain by the use of sulphate of potash was 15.25 cwt. and the S.D. 9
cwt., whence, if we want the odds that the conclusion given below is right, z = 1.7,
corresponding, when n = 4, to p = 0.9698 or odds of 32 to 1; this is midway between
the odds in the former example. Dr Voelcker says: “It may now fairly be concluded
that for the potato crop on light land 1 cwt. per acre of sulphate of potash is a better
dressing than kainit.”
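
The arithmetic behind the potato example can be retraced as follows. This is a sketch with labelled assumptions: the four gains are read as 10 cwt. 3 qr. 20 lb., 1 ton 10 cwt. 1 qr. 26 lb., 6 cwt. 0 qr. 3 lb. and 13 cwt. 2 qr. 8 lb.; the S.D. is taken with divisor n; and the closed form used for n = 4 is what the tabulated integral reduces to, not a formula quoted from the text. The helper names are hypothetical.

```python
import math


def to_cwt(tons=0, cwt=0, qr=0, lb=0):
    """Convert to hundredweights: 1 ton = 20 cwt, 1 cwt = 4 qr, 1 qr = 28 lb."""
    return 20 * tons + cwt + qr / 4 + lb / 112


def student_z_cdf_n4(z):
    """Closed form of the tabulated integral for samples of 4 (assumed)."""
    return 0.5 + (math.atan(z) + z / (1 + z * z)) / math.pi


gains = [
    to_cwt(cwt=10, qr=3, lb=20),          # 1904, first experiment
    to_cwt(tons=1, cwt=10, qr=1, lb=26),  # 1904, second experiment
    to_cwt(cwt=6, qr=0, lb=3),            # 1905, first experiment
    to_cwt(cwt=13, qr=2, lb=8),           # 1905, second experiment
]
n = len(gains)
mean = sum(gains) / n
sd = math.sqrt(sum((g - mean) ** 2 for g in gains) / n)

print(round(mean, 2), round(sd, 1))       # average gain and S.D. in cwt.
p = student_z_cdf_n4(1.7)                 # z = 1.7 as quoted in the text
print(round(p, 4), round(p / (1 - p)))    # probability and odds to 1
```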

As an example of how the table should be used with caution, I take the following
pot culture experiments to test whether it made any difference whether large or small
seeds were sown.

Illustration III. In 1899 and in 1903 “head corn” and “tail corn” were taken from
the same bulks of barley and sown in pots. The yields in grammes were as follows:



                   1899      1903

Large seed ...     13.9       7.3
Small seed ...     14.4       8.7

                   +0.5      +1.4


The average gain is thus 0.95 and the S.D. 0.45, giving z = 2.1. Now the table for
n = 2 is not given, but if we look up the angle whose tangent is 2.1 in Chambers’s
tables,

$$p = \frac{\tan^{-1} 2.1}{180^\circ} + 0.5 = \frac{64^\circ 39'}{180^\circ} + 0.5 = 0.859,$$



so that the odds are about 6 to 1 that small corn gives a better yield than large. These
odds‡‡ are those which would be laid, and laid rigidly, by a man whose only knowledge
of the matter was contained in the two experiments. Anyone conversant with pot culture
would however know that the difference between the two results would generally be
greater and would correspondingly moderate the certainty of his conclusion. In point
of fact a large-scale experiment confirmed this result, the small corn yielding about
15% more than the large.
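
For two experiments the probability reduces to an arc tangent, which makes the calculation above easy to reproduce. The sketch below assumes the S.D. is taken with divisor n and works in radians rather than degrees; `student_z_cdf_n2` is a hypothetical name.

```python
import math


def student_z_cdf_n2(z):
    """Probability for samples of 2: 0.5 + atan(z) / pi (angle in radians)."""
    return 0.5 + math.atan(z) / math.pi


gains = [0.5, 1.4]                      # small seed minus large seed, 1899 and 1903
mean = sum(gains) / len(gains)
sd = math.sqrt(sum((g - mean) ** 2 for g in gains) / len(gains))
z = mean / sd
p = student_z_cdf_n2(z)

print(round(mean, 2), round(sd, 2), round(z, 1))
print(round(p, 3), round(p / (1 - p), 1))   # probability and odds to 1
```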

I will conclude with an example which comes beyond the range of the tables, there 
being eleven experiments. 

To test whether it is of advantage to kiln-dry barley seed before sowing, seven
varieties of barley were sown (both kiln-dried and not kiln-dried) in 1899 and four in
1900; the results are given in the table.


‡‡ [Through a numerical slip, now corrected, Student had given the odds as 33 to 1 and it is to this figure
that the remarks in this paragraph relate.]
Lb. head corn per acre

N.K.D.     K.D.     Diff.

 1903      2009     +106
 1935      1915      -20
 1910      2011     +101
 2496      2463      -33
 2108      2180      +72
 1961      1925      -36
 2060      2122      +62
 1444      1482      +38
 1612      1542      -70
 1316      1443     +127
 1511      1535      +24

[The individual entries for the price of head corn, the straw and the value of the crop are not
recoverable in this copy; the averages and standard deviations for all four quantities follow.]

                                                  Average    Average    Average    Standard
                                                  N.K.D.     K.D.       Diff.      deviation

Lb. head corn per acre                            1841.5     1875.2     +33.7       63.1
Price of head corn in shillings per quarter         28.45      27.55     -0.91       0.79
Cwt. straw per acre                                 19.95      21.05     +1.10       2.25
Value of crop per acre in shillings                145.82     144.68     -1.14       6.67


It will be noticed that the kiln-dried seed gave on an average the larger yield of
corn and straw, but that the quality was almost always inferior. At first sight this might
be supposed to be due to superior germinating power in the kiln-dried seed, but my
farming friends tell me that the effect of this would be that the kiln-dried seed would
produce the better quality barley. Dr Voelcker draws the conclusion: “In such seasons
as 1899 and 1900 there is no particular advantage in kiln-drying before sowing.” Our
examination completely justifies this and adds “and the quality of the resulting barley
is inferior though the yield may be greater.”

In this case I propose to use the approximation given by the normal curve with
standard deviation s/√(n − 3) and therefore use Sheppard’s tables, looking up the difference
divided by s/√8. The probability in the case of yield of corn per acre is given
by looking up 33.7/22.3 = 1.51 in Sheppard’s tables. This gives p = 0.934, or the
odds are about 14 to 1 that kiln-dried corn gives the higher yield.

Similarly 0.91/0.28 = 3.25, corresponding to p = 0.9994,§§ so that the odds are
very great that kiln-dried seed gives barley of a worse quality than seed which has not
been kiln-dried.

Similarly, it is about 11 to 1 that kiln-dried seed gives more straw and about 2 to 1 
that the total value of the crop is less with kiln-dried seed. 
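
The normal approximation used for the eleven experiments can be sketched as follows for the yield of corn. The differences are taken as kiln-dried minus not kiln-dried, in lb. per acre, with the ninth read as -70; the S.D. uses divisor n; and the standard library's `NormalDist` stands in for Sheppard's tables. These choices are assumptions of the sketch.

```python
import math
from statistics import NormalDist

# Differences in lb. head corn per acre (kiln-dried minus not kiln-dried).
diffs = [106, -20, 101, -33, 72, -36, 62, 38, -70, 127, 24]

n = len(diffs)
mean = sum(diffs) / n
s = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)   # S.D. with divisor n
scale = s / math.sqrt(n - 3)                              # s / sqrt(8) when n = 11

ratio = mean / scale
p = NormalDist().cdf(ratio)                               # area of the normal curve below ratio

print(round(mean, 1), round(s, 1), round(scale, 1))
print(round(ratio, 2), round(p, 3), round(p / (1 - p)))   # ratio, probability, odds to 1
```

The same steps applied to the other three averages and standard deviations give the remaining odds quoted in the text.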

Section X. Conclusions 


1. A curve has been found representing the frequency distribution of standard 
deviations of samples drawn from a normal population. 

2. A curve has been found representing the frequency distribution of the means of
such samples, when these values are measured from the mean of the population in
terms of the standard deviation of the sample.

§§ As pointed out in Section V, the normal curve gives too large a value for p when the probability is large.
I find the true value in this case to be p = 0.9976. It matters little, however, to a conclusion of this kind
whether the odds in its favour are 1660 to 1 or merely 416 to 1.

3. It has been shown that the curve represents the facts fairly well even when the
distribution of the population is not strictly normal.

4. Tables are given by which it can be judged whether a series of experiments, 
however short, have given a result which conforms to any required standard of accuracy 
or whether it is necessary to continue the investigation. 

Finally I should like to express my thanks to Prof. Karl Pearson, without whose 
constant advice and criticism this paper could not have been written. 

[Biometrika, 6 (1908), pp. 1-25, reprinted on pp. 11-34 in “Student’s” Collected Papers,
Edited by E. S. Pearson and John Wishart with a Foreword by Launce McMullen,
Cambridge University Press for the Biometrika Trustees, 1942.]